Text Content Reliability Estimation in Web Documents: A New Proposal
نویسندگان
چکیده
This paper illustrates how a combination of information retrieval, machine learning, and NLP corpus annotation techniques was applied to a problem of text content reliability estimation in Web documents. Our proposal for text content reliability estimation is based on a model in which reliability is a similarity measure between the content of the documents and a knowledge corpus. The proposal includes a new representation of text which uses entailment-based graphs. Then we use the graph-based representations as training instances for a machine learning algorithm allowing to build a reliability model. Experimental results illustrate the feasibility of our proposal by performing a comparison with a state-of-the-art method.
منابع مشابه
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Besides for its own merits, text classification (TC) has become a cornerstone in many applications. Work presented here is part of and a pre-requisite for a project we have overtaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...
متن کاملA survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
متن کاملنقش ارتباطات معنایی در بهبود نتایج یک سیستم پیشنهاد استناد- مقاله برگزیده هفدهمین کنفرانس ملی انجمن کامپیوتر ایران
With the increasingly growth of scientific documents in the Web, it is difficult to select a concerned document. A citation recommendation system receives a text and recommends documents to be cited by the text. Such recommendation helps a researcher in hitting his/her concerned texts. Based on sematic relations, this paper presents a new indicator to measure the similarity between documents an...
متن کاملFine-Grained Transclusions of Multimedia Documents in HTML
Transclusions are a technique for virtually including existing content into new documents by reference to the original documents rather than by copying. In principle, transclusions are used in HTML for the inclusion of entire text documents, images, movies and similar media. The HTML specification only takes transclusions of entire documents into account, though. Hence it is not possible, for i...
متن کاملPrediction of user's trustworthiness in web-based social networks via text mining
In Social networks, users need a proper estimation of trust in others to be able to initialize reliable relationships. Some trust evaluation mechanisms have been offered, which use direct ratings to calculate or propagate trust values. However, in some web-based social networks where users only have binary relationships, there is no direct rating available. Therefore, a new method is required t...
متن کامل